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ABSTRACT 

Most linguistic models known are essentially static, that is, time is not a parameter in describing the 
behavior of the object’s model. In this paper we show a model for synchronous finite state machines based 
on fuzzy logic. Such finite state machines can be used to build both event-driven time-varying rule-based 
systems and also the control unit section of a fuzzy logic computer. The architecture of a pipelined 
intelligent fuzzy controller is presented, and the linguistic model is represented by an overall fuzzy relation 
stored in a single rule memory. A VLSI integrated circuit implementation of the fuzzy controller is 
suggested. At a clock rate of 30 MHz, the controller can perform 3 MFLIPS on multi-dimensional fuzzy 
data. 


KEYWORDS: Fuzzy Modeling, Intelligent Fuzzy Controller, Fuzzy Logic Hardware 
Accelerator, VLSI Implementation 

1. FUZZY LOGIC FINITE STATE MACHINES 

The general model of a finite state machine (FSM) is illustrated in Figure 1.1. Formally, a sequential 
circuit is specified by two sets of Boolean logic functions: 

f z (X, y) -> Z, and 
fy (X, y) -4 Y, 

where X, Z, y, and Y stand for a finite set of inputs, outputs, present and next slate of the state variables, 
respectively. Functions f z and fy map the inputs and the present states of the state variables to the outputs 
and the next states of the state variables, respectively. 



Figure 1.1 General model of a finite state machine (FSM). 

The current states of the memory elements hold information on the past history of the circuit. The 
behavior of a synchronous sequential circuit can be defined from the knowledge of its signals at discrete 
instants of time. Those time instants are determined by a periodic train of clock pulses. The memory 
elements hold their outputs until the next clock pulse arrives. 


1 Visiting Professor from Technical University of Budapest, Hungary. 
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We extend this model by introducing membership functions and fuzzy relations to map the changes which 
take place in fuzzy input data to fuzzy outputs and next states of the state variables. 

With the model presented in this paper, the definition of states will remain crisp, that is, the state of the 
system can be represented in one of the usual ways (i.e. by isolated flip-flops, registers or a 
microprogrammed control unit). The fuzzy outputs will be devised from a dynamically changing linguistic 
model since the response to a specific change at the fuzzy inputs will vary with different states of the FSM. 
We will refer to this model as Crisp-State-Fuzzy-Output FSM or CSFO FSM. A block diagram of the 
CSFO FSM is shown in Figure 1.2. X and Z stand for a finite set of fuzzy inputs and outputs, 
respectively. 



Z = XoR(y) 
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Figure 1.2 General model of the Crisp-State-Fuzzy-Output FSM (CSFO FSM). 

R stands for the object's model which is now function of the y present states of the state variables, and o is 
the operator of composition. The z c crisp values of the fuzzy outputs are obtained by computing the DF 
defuzzification strategy. B stands for the transformation which maps the linguistic values of the X 
linguistic (fuzzy) variables to the Xb Boolean (two-valued) logic variables. Function f y maps both the Xb 
Boolean logic variables and also the y present states to the Y next states of the state variables. 

To accelerate the mapping of the fuzzy inputs X to a new set of fuzzy and crisp outputs Z and z c , 
respectively, (i.e. to compute fuzzy inference and the DF defuzzification strategy) our pipelined fuzzy logic 
hardware accelerator model [5] will also be employed with the CSFO FSM. The next states of the state 
variables will be devised from the present states and the Xb Boolean logic variables. For instance, a 
Boolean variable X1LOW is true if the position of the maximum in the membership function for linguistic 
variable XI falls in the range 1 to 5. X1LOW is otherwise false. 

The state transients will be completed simultaneously with the fuzzy pre-processing pipeline step (Figure 
3.3). A new Sk state of the CSFO FSM will then select an overall fuzzy relation Rk which will in turn 
be used as the linguistic model in the fuzzy inference pipeline step while the system is in state Sk- With 
this model, the state variables will take their new values at the rate at which the pipeline steps proceed. 
The fuzzy outputs will be defuzzified in the last pipeline step. 

In the course of the learning process (eq. 2.3), an overall relation Rj is created for each state Si (I = 1 N) 

of the CSFO FSM. 

2. ALGORITHM OF CREATING A MULTIPLE-INPUT FUZZY MODEL 

A linguistic model of a process can be built by software; fuzzy inference and defuzzification strategies can 
also be computed without using any dedicated hardware. However, in case of real-time control applications, 
the pure software approach may not be sufficient. We suggest a hardware accelerator for a multiple-input 
fuzzy logic controller. The accelerator is based upon the mathematical model as follows. 

The process operation control strategy is created by analysis of input and output values, in which not only 
measurable quantities are taken into account but also parameters which cannot be measured, only observed 
[1]. On the basis of the verbal description, which is called a linguistic model, a fuzzy relation R is created: 

N 

R = *(XI->YI). (2.1) 

1=1 
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In formula (2.1) -» is a symbol of the operation or operations by which fuzzy implications are defined, and 
the symbol * represents an operation which interprets the sentence connective ALSO. 

We shall present the algorithm not only intended for creating a fuzzy model with a given verbal description 
is given, but also for determining the model's answer to a given input [2]. 

The verbal description of the process performance contains N relations, and fuzzy sets describe the particular 
states which occur in the verbal description of inputs and and output Y be given in 
formula (2.2). The graphic interpretations [4] of fuzzy sets X^\ and Y are illustrated in Figure 2.1. 


Rl: EFX^ is very small (x^l) AND X^ is medium (x^l) THEN Y is medium 

(Yl) 

ALSO (2.2) 

RN: IF X (1) is very big (x (1) n) AND X (2) is medium (x (2) n) THEN Y is medium 
(YN) 


The paragraphs below illustrate in turn: 

Fuzzy Learning 

A method of creating fuzzy relation Rl which represents the first fuzzy implication in the verbal description 
is interpreted as intersection. The remaining relations R2, R3,..., RN are created analogously by 
application of the same definition of fuzzy implication. 

Rl = XlxYl 

V(u,w)eUxW Rl(u,w) = min (XI (u), Yl(w)) (2.3) 

Vue U Xl(u) = min(X (1) (u), X (2) (u)) 

The final relation R (being the object's model) is obtained as the union of Rl, R2,..„ RN, since the 
sentence connective ALSO is defined as union. 

R = RluR2u...uRN 

V(u,w)eUxW R(u,w) = max(Rl(u,w), R2(u,w),..., RN(u,w)) (2.4) 





Figure 2.1. Graphic interpretations of fuzzy sets X 


(1) , x (2) 


, and Y. 
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(2.5) 


( 2 . 6 ) 


The hardware accelerator which performs the fuzzy learning, fuzzy inference, and defuzzification 
computation, that is, which maps the fuzzy inputs to fuzzy and/or crisp outputs, is summarized in this 
section. 

Currently, in our research the degree of membership function is a discrete valued function with a 5-element 
domain set. With two-valued logic, three bits are used to represent each element of the set. The number of 
levels can be extended up to eight. The universe of discourse of a fuzzy subset is limited to a finite set with 
25 elements (u max = w max = 25). Seventy five bits are used for digitization of the membership function. 
The accelerator consists of four basic units: the host interface, the fuzzy pre-processing unit, the combined 
fuzzy model/fuzzy inference unit, and the defuzzifier unit. The last two are referred to as the fuzzy engine 
[3], The functional block diagram of the accelerator is shown in Fig. 3.1. To achieve a high processing rate 
for real time applications, the units are connected in a four-level pipeline. 



Figure 3.1. Pipeline architecture of the hardware accelerator. 
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The core of the hardware accelerator is a fuzzy engine which implements the formuli (eq. 2.4) to (eq. 2.6). 
It is split into the fuzzy model/fuzzy inference unit and the defuzzifier unit. The functional block diagram 
of the fuzzy model/fuzzy inference unit (without increased parallelism) is shown in Figure 3.2. 

After the XI and YI registers have been loaded, learning a multi-dimensional rule RK takes Umax clock 
periods. The MUX2 multiplexer at the input of the minimum unit selects the YI register. During the first 
clock step, ui is paired with all w elements of YI and these pairs are fed to the inputs of the minimum unit. 
If the current rule is the first in a learning sequence, throughout the learning cycle 0 (non membership) 
elements will be paired with the outputs of the minimum unit and fed to the inputs of the maximum unit. 
The whole word of maximum values is stored at the first location of the R rule memory. During the jth 
clock step, uj is compared to all w elements of YI simultaneously and the vector of the max elements is 
stored in the jth location of R. 

If the current rule is not the first one in the learning process, the MUX3 multiplexer at the input of the 
maximum unit selects the ith row of R(1 < i < u ma x) during the ith clock step and the contents of this row 
in R will be updated from the outputs of the maximum unit. 



Figure 3.2 Functional block diagram of the fuzzy model/fuzzy inference unit. 

Therefore the learning process of N rules takes Nxu ma * clock periods with the architecture shown in Figure 
3.2. The clock steps needed to load registers XI and YI are ignored at this point. Computing the 

fuzzy inference (max-min composition) also takes u m ax clock periods. This time the MUX2 multiplexer at 
the input of the minimum unit pairs the uj element of XI with all r elements of the ith row in R. If i = 1 
(first clock step), then the MUX3 multiplexer at the input of the maximum unit selects 0 as the other 
operand for each element at the output of the minimum unit. The outputs of the maximum unit are fed to 
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the inputs of the Y register. From the second to the last clock steps, outputs of the Y register are fed back 
to the inputs of the maximum unit through the MUX3 multiplexer. Contents of the R rule memory 
remain unchanged during the fuzzy inference process. After the last clock step, register Y holds the result of 
the XoR operation in the digitized fuzzy data format. 

To detect whether the condition: V(u,w)e UxW, R(u,w) = 1 is met, an error flag was added to the fuzzy 
engine. If the error flag is activated at the completion of the learning of a new rule, then all elements in the 
R rule memory equal 1 (full membership). This flag can be used to generate an interrupt request to the host 
machine. The system can then recover from this erroneous state by either downloading a "safe" model to 
the R memory or starting over the learning process with a modified model. 

Due to the linear property of the max-min composition, by quadrupling the functional units of the basic 
architecture, the time required to complete the pipeline steps for either the fuzzy learning or the fuzzy 
inference process can be reduced to [u m ax + 4]+ 2 clock periods. 

Since the precedence relation of the subtasks (I/O data transfer (T i), the pre-processing of the multiple fuzzy 
inputs (T 2 ), the learning of a new rule or the performing a fuzzy inference operation (T 3 ), and the 
defuzzification (T 4 )) are all linear operations, the four basic units of the hardware accelerator form a linear 
pipeline. The pipeline architecture allows the simultaneous operation of the four units. The space-time 
diagram in Figure 3.3 illustrates the overlapped operations of the pipeline units. Assuming that the 
downloading of the fuzzy data from the host system to the accelerator and the reading of the fuzzy and/or 
crisp output data from there (subtask Ti) does not exceed [u max +4]+2 (9) clock periods, the accelerator 
produces new fuzzy and/or crisp output data every [u max +4]+2 clock periods once the pipeline is filled. 
Thus, at a clock rate of 30 MHz the fuzzy engine can perform over 3,000,000 fuzzy logical inferences per 
second with the current fuzzy data format 
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IF : interface unit 
PP : pre-processing unit 
MI : model/inference unit 
DF : defuzzifier unit 

Figure 3.3. Overlapped operations of the pipeline units. 

4. VLSI IMPLEMENTATION 

One of the most difficult issues coming from the practical realization is associated with the VLSI 
implementation, therefore the information provided in this section are based on our estimates and previous 
experience with projects of a similar nature. 

Due to our objective constraints, i.e. the MOSIS service is available for chip fabrication at this time, the 
full design version of the proposed controller will be designed, along with a scaled-down version which will 
pass the constraints and will finally be fabricated. 

There are two different versions of the fuzzy logic controller that could be useful in most practical 
implementations: a controller working stand alone (S A) or with an appropriate host computer (HC). These 
options will be taken into consideration. 

Let us discuss the VLSI implementation issues in more detail starting with the full scale design. According 
to our preliminary assumption we come up with the descriptions of design signals which are summarized in 
Table 1 . 
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Full scale HC version 

Full scale SA version 

64b scaled address/data bus 

64b digital address/data bus (2> 

-55 signal for data bus control * 

32b XPROM interface 

(Bus Parity), (Command) 

4 XPROM control signals 

(Status and CP), (Capability) 

4 analog inputs % 

(Synchronization) 

3 signals for fuzzification control # 

wenaHHHMHii 

3 signal for inference control # 


2 signals for defuzzification control # 

3 signals for fuzzification control # 

4 signals for mode control 

3 signals for inference control # 

2 or 3 analog outputs 

3 signals for defuzzification control # 

1 CLK global clock 

1 CLK global clock 

1 STB strobe signal 

1 RESET input 

1 EN synchronization input 

1 CS chip select 

1 RESET input 


-8 power supply inputs 


* We are currently working on the data bus control so this number can be changed. 

# These options could be programmable. 

% This number is a subject of investigation and can be changed. 

@ Can be used to substitute for a single analog input/output. 

Table 1. Preliminary definitions of signals for full scale versions of HC and SA Fuzzy Logic Controller. 

We assume that the proposed fuzzy controller will have three basic cycles of operation: fuzzy learning, 
fuzzy inference and stand-by. In case of the fuzzy learning and fuzzy inference operations the HC version 
will be supplied with fuzzy data through the host computer which performs the fuzzification of the analog 
inputs. It is obvious that HC version will be able to process only digital representation of the fuzzy data 
prepared by the host computer. In our first approach this version will not be cascadable. The SA version of 
the chip will input the analog data and perform the fuzzification operation by itself. The stand-by mode will 
be common for both versions. 



Fig. 4.1. Configuration for the Hardware Accelerator working under the host computer (HC version). 
One can also see that the HC version will require a very detailed design of the interface to the bus system 
used by the host computer for data transmission while the S A version will need an A/D converter and a few 
D/A converter, which will be included in the chip design. 

It is assumed that such a version will be communicating with the host computer through FUTUREBUS 
(Fig. 4.1). 

Both versions has its own advantages and disadvantages basically due to communications issues, the 
number of pins and the design effort. It is important to point out that one can expect some instant 
differences in the performance of the four versions which will further be investigated in detail. 


318 




























Let us now focus on the scaled down implementations of the SA version of the proposed fuzzy logic 
controller. There will be designed two basic modes for chip operation: normal and programmed. The normal 
mode will include cascaded (parallel or serial) and non cascaded operation. The block diagrams illustrating 
these modes are shown in Figs. 4.2 and 4.3. 

Analog Analog Analog 

Outputs Outputs Outputs 



Figure. 4.2. Serial configuration for Fuzzy Logic Controller (SA version). 


Sync 



Clock 


Analog 

Inputs 


Figure. 4. 3. Parallel configuration for Fuzzy Logic Controller (SA version). 

In the programming mode, we assume that it will be possible to preprogram the fuzzifier, defuzzifier or 
inference engine, or any combination of these, in order to preserve a flexible operation tuned to the actual 
user. In order to achieve programmability, an EPROM/EEPROM type of memory block will be built-into 
the chip, and will be controlled by the external source through a memory I/O port and control signals. Our 
decision to fabricate this version is based on both the number of pins and also the number of signals needed 
to implement this version (no data interface is needed). The scaled down implementation of our design 
matching the objective constraints is presented as follows. 

* Chip size and package 

The reasonable MOSIS package has 132 pins and can contain a chip occupying 7.8mm*9.2mm of silicon 
area (max). The choice of CMOS technology leads us to the variety of available processes starting from 
iambda=l\im to lambda=0.6\im. Keeping in mind the maximum signal frequency for chip operation, which 
was originally set between 25MHz and 30MHz, as well as the maximum chip size, the n-well, double- 
metal CMOS technology with lambda=0.6pm will be adequate to achieve the design goals. 
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• Chip area and number of transistors 

The maximum chip area of 72.68mm 2 (132 pins package) can contain about 600000 transistors for highly 
regular structure 2 with the standard CMOS technology (Jambda-Q.6\im). According to our estimations we 
will be able to put at most as four parallel fuzzy data processing paths into the chip. The single data 
processing path including the programming options (memoryless) is estimated to have about 50,000 
transistors.. 

• Clock strategy and clock distribution 

We decided to use single external clock signal (CLK) to generate an on-chip, two-phase non overlapping 
internal clock signals (<hl and <t>2), the two phase-clock system having the advantage of making hazard 
problems within the pipeline paths more easily identifiable. These phases will be distributed over the whole 
chip using a second metallization layer. Because the longest possible metal line is about 10mm, we chose a 
tree-like structure for phase distributions driven by high gain clock drivers. These drivers will be designated 
to drive an appropriate capacitive load of the whole clock line tree. According to the results of our previous 
research, the single processing path will have the ability to operate at 8 clock cycles/pipeline step. Setting 
the external clock rate at around 30MHz will enable us to operate the processor at a high processing rate. A 
future detailed investigation will help us to determine the highest possible clock rate. 

• Rule memory 

The major problem with the limited capacity of the internal SRAM (for HC version) of EPROM/EEPROM 
(for SA version) memory for the storage of rules of inferences in previous works [6-8J does not exist in our 
approach due to the strategy of building the global rule for the whole linguistic model described in Chapter 
2. In our case only 1/4 Kbyte SRAM or EPROM/EEPROM is needed to store the global rule. Such an 
approach creates a luxury of increasing the parallelism of the internal structure by a factor of four, which is 
discussed in the next section. 

It also should be noted that the idea of CSFO FSM is intended to be implemented in the SA version. 
Furthermore, the required extension of the rule memory (every FSM state will have assigned rule memory) 
will be evaluated. It is however unlikely that overall number of transistors for a single path will reach 
100,000 transistors. 

• Pipeline architecture 

The estimated number of transistors for a single fuzzy data processing path is around 50,000. This means 
that the chip under consideration has the capacity containing at least four separate fuzzy data processing 
paths plus rule memory, which gives total estimation of about 300,000 transistors (look-up table used for 
defuzzification is included). The estimated area occupied by transistors is about 45 mm 2 . The rest of the 
chip area will be used to provide high speed communication between processing units and the built-in 
memory (EPROM/EEPROM). It is expected that four parallel data paths will be designed in the chip 
increasing the actual speed of operation twice. In the proposed design, 3 MFL1PS performance is expected 
assuming the clock rate will be up to 30MHz. 

5. CONCLUSIONS 

The paper describes the general model for fuzzy state machine (FSM) which is used to formulate the fuzzy 
controller for event-driven real-time systems. As a result the improved architecture for fuzzy logic controller 
has been defined. 

The improvement with respect to already published architectures [5-9] comprises in a novel strategy for 
fuzzy model building, which enables fuzzy inferences to be performed in a single stage of a hardware 
accelerator. As it has been estimated the proposed architecture, appropriately pipelined, for the hardware 
accelerator will profit in reaching at least 3M fuzzy logical operations per second. 

The presented approach can be utilized for fuzzy controller hardware accelerators intended to work in the real 
time environment. 


2 Excluding the area occupied by the chip frame. 
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